Load the data frames and the alignment (in FASTA format) to calculate the pairwise genetic distance matrix:
##
## Attaching package: 'seqinr'
## The following objects are masked from 'package:ape':
##
## as.alignment, consensus
## Loading required package: ggplot2
## corrplot 0.92 loaded
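The code for this step is not shown in the report. As a minimal sketch of what a pairwise genetic distance matrix is, the base R example below computes the p-distance (the proportion of differing sites) between aligned sequences. The sequences and the names `aln`, `pdist`, and `distdf` are hypothetical; in practice a function such as `ape::dist.dna` or `seqinr::dist.alignment` would usually do this job, and which one the report used is not stated.

```r
# Toy aligned sequences (hypothetical data; all of equal length,
# as they would be in a FASTA alignment).
aln <- c(
  seqA = "ATGCATGCAT",
  seqB = "ATGCATGCAA",
  seqC = "ATGGATGCTA",
  seqD = "TTGGATGCTA"
)

# Split each sequence into a row of single characters.
chars <- do.call(rbind, strsplit(aln, ""))

# p-distance: proportion of alignment columns at which two sequences differ.
n <- nrow(chars)
pdist <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    pdist[i, j] <- mean(chars[i, ] != chars[j, ])
  }
}

# Data frame form: one row and one column per sequence.
distdf <- as.data.frame(pdist)
print(round(distdf, 2))
```

A data frame of this shape would play the role that `RVCdistdf` appears to play in the later steps, although that correspondence is an assumption.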
Calculate the eigenvalues and run the principal component analysis (PCA):
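The report does not show the code for this step. The sketch below illustrates the idea with base R's `prcomp` on a small hypothetical data frame (`toy`, built from random points); the eigenvalues are the squared standard deviations of the components. The report may well have used other functions for the same purpose.

```r
set.seed(1)
# Hypothetical stand-in for a distance data frame: 6 samples x 6 columns.
pts <- matrix(rnorm(12), ncol = 2)
toy <- as.data.frame(as.matrix(dist(pts)))

# PCA on the centered, unscaled columns.
pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# Eigenvalues of the covariance matrix are the squared standard deviations.
eig <- pca$sdev^2
var_pct <- 100 * eig / sum(eig)

print(data.frame(PC = seq_along(eig),
                 eigenvalue = round(eig, 4),
                 variance_pct = round(var_pct, 2)))
```

The `variance_pct` column is what a scree plot displays: the share of total variance captured by each component.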
Analyze the quality of representation and the contribution of individuals to the principal components:
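As a hedged illustration of these two measures, the sketch below derives them from the PCA scores using their usual definitions: the quality of representation (cos2) and the percentage contribution of each individual. The data (`X`) are hypothetical, and whether the report computed the measures exactly this way is not shown.

```r
set.seed(2)
# Hypothetical data: 8 individuals described by 5 variables.
X <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("ind", 1:8), paste0("v", 1:5)))
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x  # coordinates of the individuals on the components

# Quality of representation (cos2): share of an individual's squared
# distance to the origin captured by each component. Rows sum to 1.
cos2 <- scores^2 / rowSums(scores^2)

# Contribution (%): share of each component's total variance that is
# due to each individual. Columns sum to 100.
contrib <- 100 * sweep(scores^2, 2, colSums(scores^2), "/")

print(round(cos2[, 1:2], 3))
print(round(contrib[, 1:2], 2))
```

An individual with a high cos2 on the first two components is well described by a two-dimensional PCA plot, while a high contribution marks an individual that strongly influences the orientation of a component.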
Evaluate the PCA by grouping the individuals according to their metadata:
## Warning: Removed 41 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Too few points to calculate an ellipse
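The messages above suggest that some samples lack metadata and that some metadata groups are too small for a confidence ellipse. A quick check of group sizes before plotting can make this explicit. The sketch below uses a hypothetical metadata vector (`region`), and the minimum of four points per group is an assumption about the plotting function's threshold, not something stated in the report.

```r
# Hypothetical metadata column; NA marks samples with missing information.
region <- c("north", "north", "north", "north", "north",
            "south", "south", "south", "south",
            "east", "east", NA, "west", NA)

tab <- table(region)        # samples per group; NA values are not counted
groups <- names(tab)
counts <- as.integer(tab)

min_points <- 4  # assumed minimum group size for drawing an ellipse

drawable <- groups[counts >= min_points]
too_small <- groups[counts < min_points]

cat("Missing metadata:", sum(is.na(region)), "samples\n")
cat("Groups with an ellipse:", paste(drawable, collapse = ", "), "\n")
cat("Groups too small:", paste(too_small, collapse = ", "), "\n")
```

Filtering or merging the small groups before plotting would avoid the repeated message, at the cost of hiding those groups' ellipses from the figure.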
Construct a dendrogram and perform the hierarchical clustering:
library("dendextend")
##
## Attaching package: 'dendextend'
## The following objects are masked from 'package:ape':
##
## ladderize, rotate
## The following object is masked from 'package:stats':
##
## cutree
library("svglite")
RVCdm <- dist(RVCdistdf, method = "euclidean")  # Euclidean distances between rows
RVChc <- hclust(RVCdm, method = "complete")     # complete-linkage dendrogram
# evaluate the number of clusters with the average silhouette width:
# (note: hcut() uses Ward linkage, "ward.D2", by default, not the complete linkage used above)
fviz_nbclust(RVCdistdf, FUNcluster = hcut, method = "silhouette", k.max = 10)
# evaluate the number of clusters with the within-cluster sum of squares (WSS):
fviz_nbclust(RVCdistdf, FUNcluster = hcut, method = "wss")
cut_cmp <- cutree(RVChc, k = 7)  # cut the tree into 7 clusters, the number chosen from the WSS plot
plot(RVChc, hang = -1, cex = 0.2)
# draw a rectangle around each of the 7 clusters:
rect.hclust(RVChc, k = 7, border = 2:7)
# save the dendrogram plot (uncomment to write the SVG file):
#svg(filename = "RVC_HC_PCA_euclidean.svg")
#plot(RVChc, hang = -1, cex = 0.2)
#rect.hclust(RVChc, k = 7, border = 2:7)  # optional: mark the clusters with rectangles
#dev.off()
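To make the "wss" criterion concrete, the sketch below computes the total within-cluster sum of squares by hand for several cuts of a complete-linkage tree. It uses base R only and hypothetical toy data with three well-separated groups; the helper `wss_for` is illustrative and not part of any package.

```r
set.seed(3)
# Hypothetical data: three well-separated groups of 10 points in 2-D.
toy <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2))

hc <- hclust(dist(toy, method = "euclidean"), method = "complete")

# Total within-cluster sum of squares for a given cluster assignment:
# squared deviations of each point from its own cluster centroid.
wss_for <- function(data, clusters) {
  sum(sapply(split(as.data.frame(data), clusters), function(g) {
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

ks <- 1:6
wss <- sapply(ks, function(k) wss_for(toy, stats::cutree(hc, k = k)))
print(data.frame(k = ks, wss = round(wss, 2)))
```

The WSS can only decrease as k grows, so the useful signal is the "elbow": the value of k after which adding clusters yields little further reduction. In this toy example the drop flattens after k = 3.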
Evaluate the association between the metadata and the clusters obtained by hierarchical clustering:
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:seqinr':
##
## count
## The following object is masked from 'package:ape':
##
## where
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::count() masks seqinr::count()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::where() masks ape::where()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'ggpubr'
## The following object is masked from 'package:dendextend':
##
## rotate
## The following object is masked from 'package:ape':
##
## rotate
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
##   Group Letter MonoLetter
## 1     1      a          a
## 2     2      a          a
## 3     3      a          a
## 4     4      a          a
## 5     5      a          a
## 6     6      a          a
## 7     7      a          a
##   Group Letter MonoLetter
## 1     1      a          a
## 2     2      a          a
## 3     3      a          a
## 4     4      a          a
## 5     5      a          a
## 6     6      a          a
## 7     7      a          a
##   Group Letter MonoLetter
## 1     1      a          a
## 2     2      a          a
## 3     3      a          a
## 4     4      a          a
## 5     5      a          a
## 6     6      a          a
## 7     7      a          a
##   Group Letter MonoLetter
## 1     1      a          a
## 2     2      a          a
## 3     3      a          a
## 4     4      a          a
## 5     5      a          a
## 6     6      a          a
## 7     7      a          a
## # A tibble: 1 × 6
##   .y.       n statistic    df      p method
## * <chr> <int>     <dbl> <int>  <dbl> <chr>
## 1 age     202      15.0     6 0.0201 Kruskal-Wallis
## # A tibble: 21 × 9
##    .y.   group1 group2    n1    n2 statistic     p p.adj p.adj.signif
##  * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>
##  1 age   1      2         22    34      366. 0.893 1     ns
##  2 age   1      3         22    20      114. 0.008 0.171 ns
##  3 age   1      4         22    29      250. 0.189 1     ns
##  4 age   1      5         22    37      300. 0.096 1     ns
##  5 age   1      6         22    38      413  0.945 1     ns
##  6 age   1      7         22    22      232. 0.833 1     ns
##  7 age   2      3         34    20      180. 0.004 0.089 ns
##  8 age   2      4         34    29      378. 0.116 1     ns
##  9 age   2      5         34    37      474. 0.076 1     ns
## 10 age   2      6         34    38      653  0.941 1     ns
## # ℹ 11 more rows
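The two tables above follow a common pattern: a global Kruskal-Wallis test across the seven clusters (6 degrees of freedom), then pairwise comparisons with adjusted p-values. The sketch below reproduces that pattern in base R on hypothetical data. The simulated `age` values and the Bonferroni correction are assumptions; the adjustment method used in the report is not shown.

```r
set.seed(4)
# Hypothetical data: an age for each sample and its cluster (1 to 7).
cluster <- factor(rep(1:7, each = 12))
age <- rnorm(length(cluster), mean = 40, sd = 10)
age[cluster == 3] <- age[cluster == 3] + 15  # shift one cluster on purpose

# Global test: does age differ among the clusters?
kw <- kruskal.test(age ~ cluster)
print(kw)

# Pairwise Wilcoxon rank-sum tests with a Bonferroni correction
# (assumed here; the report's adjustment method is not shown).
pw <- pairwise.wilcox.test(age, cluster, p.adjust.method = "bonferroni")
print(round(pw$p.value, 3))

# Seven clusters give choose(7, 2) = 21 pairwise comparisons.
n_pairs <- sum(!is.na(pw$p.value))
cat("Number of pairwise comparisons:", n_pairs, "\n")
```

With 21 comparisons, a conservative correction can leave every pair non-significant even when the global test is significant, which is the situation the visible rows of the pairwise table show.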